Working with JSON documents

JSON documents are very popular, especially when it comes to API responces and/or financial data. They provide nice, dictionary-like interface to data with the opportunity of working with keys rather than indecies only. Thus, Python has a built-in support for JSON documents with necessary ready-made functions. To access those functions, one needs to import the JSON library, which comes directly installed with Python.



In [3]:

    
import json

Let's create a sample JSON fie and save it to some variable called input.



In [4]:

    
input = '''[
  { "id" : "01",
    "status" : "Instructor",
    "name" : "Hrant"
  } ,
  { "id" : "02",
    "status" : "Student",
    "name" : "Jimmy"
  }
]'''

As you can see here, our JSON documents is nothing else than a list of two dictionaires with 3 keys each (and a value for each key). To parse it as a usual Python object (list in this case), the loads() function from the json package is used.



In [5]:

    
# parse/load string
data = json.loads(input)
# data is a usual list



In [6]:

    
type(data)









    Out[6]:





list



In [7]:

    
print(data)









    



[{u'status': u'Instructor', u'id': u'01', u'name': u'Hrant'}, {u'status': u'Student', u'id': u'02', u'name': u'Jimmy'}]



In [8]:

    
from pprint import pprint



In [9]:

    
pprint(data)









    



[{u'id': u'01', u'name': u'Hrant', u'status': u'Instructor'},
 {u'id': u'02', u'name': u'Jimmy', u'status': u'Student'}]



In [10]:

    
print 'User count:', len(data), "\n"









    



User count: 2



In [11]:

    
data[0]['name']









    Out[11]:





u'Hrant'



In [12]:

    
for element in data:
    print 'Name: ', element['name']
    print 'Id: ', element['id']
    print 'Status: ', element['status'], "\n"









    



Name:  Hrant
Id:  01
Status:  Instructor 

Name:  Jimmy
Id:  02
Status:  Student

Reading JSON from a file

using Pandas



In [13]:

    
import pandas as pd



In [14]:

    
address = "C:\Data_scraping\JSON\sample_data.json"



In [15]:

    
my_json_data = pd.read_json(address)



In [16]:

    
my_json_data.head()









    Out[16]:






  
    
      
      id
      name
      status
    
  
  
    
      0
      1
      Hrant
      Instructor
    
    
      1
      2
      Jack
      Student

using with open()



In [17]:

    
import json



In [18]:

    
with open(address,"r") as file:
    local_json = json.load(file)



In [19]:

    
print(local_json)









    



[{u'status': u'Instructor', u'id': u'01', u'name': u'Hrant'}, {u'status': u'Student', u'id': u'02', u'name': u'Jack'}]



In [20]:

    
type(local_json)









    Out[20]:





list



In [21]:

    
pprint(local_json)









    



[{u'id': u'01', u'name': u'Hrant', u'status': u'Instructor'},
 {u'id': u'02', u'name': u'Jack', u'status': u'Student'}]

Writing JSON files



In [22]:

    
with open('our_json_w.json', 'w') as output:
    json.dump(local_json, output)

Yet, as you may have already noticed, the saved JSON files not human-readible. To make them more user friendly, we may sort the Keys and provide 4-tab indentation.



In [23]:

    
with open('our_json_w.json', 'w') as output:
    json.dump(local_json, output, sort_keys = True, indent = 4)

Converting JSON to CSV



In [25]:

    
import csv, json
address = "C:\Data_scraping\JSON\sample_data.json"
with open(address,"r") as file:
    local_json = json.load(file)

with open("from_json.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["ID","Name","Status"])
    for item in local_json:
        writer.writerow([item['id'],item['name'],item['status']])